
DATA OBJECTS AND THE DATASET MACRO

DATA OBJECTS are created by the DATASET macro from a collection of variables. DATASET can also report the names of existing data objects and can be used for more advanced data manipulation. This help file describes the macro.


_____________________________________

CREATING DATA OBJECTS
_____________________________________

DATASET is used to create a data object from a collection of variables. This is done by typing:
   (DATASET NAME VAR1 VAR2 ...)
where NAME is the name of the new data object, and VAR1, VAR2, etc., are the names of numeric variables. When the variables are not all numeric, you must include the :TYPES keyword, followed by a list giving the type of each variable in the same order as the variables:
   (DATASET NAME VAR1 VAR2 :TYPES (NUMERIC CATEGORY))

The variables must all have the same number of observations. You may use long or short names (see the ABOUT VARIABLES AND DATA help file). Short names are easier to type, but they introduce the possibility of duplicate variable names. If there are duplicates, the most recently created variable is usually used, though the choice may be ambiguous. Long names are more cumbersome, but they cannot be duplicated. The variables may be "free" variables (i.e., those created by ViVa or VAR) or "data" variables (those already in data objects). A variable that appears in more than one data object is a distinct variable in each one: changes made to it in one data object do not affect the others.
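The copy semantics described above can be modeled in a few lines. This is a Python sketch, not ViSta code: `DataObject` and `make_data_object` are hypothetical names. It assumes a data object simply stores its own copy of each variable's values, and it also checks that all variables have the same number of observations.

```python
from dataclasses import dataclass, field


@dataclass
class DataObject:
    """Hypothetical stand-in for a data object: it owns its own copy
    of each variable's values, keyed by variable name."""
    name: str
    variables: dict = field(default_factory=dict)


def make_data_object(name, free_vars, *var_names):
    """Build a data object from named variables (hypothetical helper)."""
    lengths = {len(free_vars[v]) for v in var_names}
    if len(lengths) > 1:
        raise ValueError("all variables must have the same number of observations")
    # list(...) copies the values, so no two objects share storage.
    return DataObject(name, {v: list(free_vars[v]) for v in var_names})


free_vars = {"height": [150, 162, 171], "weight": [50, 61, 70]}
a = make_data_object("a", free_vars, "height", "weight")
b = make_data_object("b", free_vars, "height")

a.variables["height"][0] = 999    # change the copy held by a
print(b.variables["height"][0])   # b's copy is unchanged: 150
```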

To find out which variables are available for forming a new data object, type one of these names:
   NAME        INFORMATION
   $vars       list of variables in the current data object
   $all-vars   list of all variables
   $data-vars  list of all data object variables
   $free-vars  list of all free variables
   $NAME-vars  list of all variables in data object NAME

You may label your observations by using the :LABELS keyword, which must be followed by a list of strings, one per observation. If :LABELS is omitted, the observations are named "Obs1", "Obs2", etc.
   (DATASET NAME VAR1 VAR2 
            :TYPES (NUMERIC CATEGORY)
            :LABELS ("A" "B" "C"))

If the data are frequency data, you must use the :FREQ keyword followed by T (for true) to specify that the values of the numeric variables are frequencies. For example:
   (dataset freqdata frequency center treatment
            :types (numeric category category)
            :freq t)
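With :FREQ T, each numeric value is a count of cases that share a combination of categories, not a measurement. The Python sketch below illustrates this with made-up records in the shape of the freqdata example (frequency, center, treatment). `marginal_totals` is a hypothetical helper, not a ViSta function. It shows that totals are obtained by summing frequencies rather than counting rows.

```python
from collections import Counter

# Made-up records in the layout of the freqdata example:
# (frequency, center, treatment).
records = [
    (12, "Boston", "Drug"),
    (8,  "Boston", "Placebo"),
    (15, "Denver", "Drug"),
    (5,  "Denver", "Placebo"),
]


def marginal_totals(records, column):
    """Sum the frequencies over one category variable.
    column is 1 for center and 2 for treatment in this layout."""
    totals = Counter()
    for row in records:
        totals[row[column]] += row[0]   # each record stands for row[0] cases
    return dict(totals)


print(sum(r[0] for r in records))    # total number of cases: 40
print(marginal_totals(records, 1))   # {'Boston': 20, 'Denver': 20}
print(marginal_totals(records, 2))   # {'Drug': 27, 'Placebo': 13}
```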

You may also use the optional :ABOUT keyword, followed by a string of information about the data.

Using the arguments discussed above, you can create the following types of data:
1) MULTIVARIATE data are any data that are not one of the other types listed below. They include univariate (one variable) and bivariate (two variables) data.
2) CATEGORY data have one or more CATEGORY variables and no NUMERIC or ORDINAL variables. The N category variables define an N-way classification.
3) CLASSIFICATION data have one NUMERIC variable and one or more CATEGORY variables. The N category variables define an N-way classification, and the numeric variable gives the observed value for each combination of categories.
4) FREQUENCY CLASSIFICATION data are classification data whose numeric variable contains frequencies, as indicated by :FREQ T. The N category variables define an N-way classification, and the numeric variable gives the co-occurrence frequency of each combination of categories.
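Rules 1 to 4 can be read as a small decision procedure over the variable types and the :FREQ flag. The Python sketch below encodes that reading. `data_type` is a hypothetical helper and is not how ViSta decides internally. Combinations the rules do not mention, such as :FREQ T on multivariate data, simply fall through to "multivariate".

```python
def data_type(types, freq=False):
    """Classify a data object from its variable types and the :FREQ flag,
    following rules 1-4 above. Hypothetical helper, not part of ViSta."""
    kinds = [t.lower() for t in types]   # type names are case-insensitive
    n_numeric = kinds.count("numeric")
    n_ordinal = kinds.count("ordinal")
    n_category = kinds.count("category")

    if n_category >= 1 and n_numeric == 0 and n_ordinal == 0:
        return "category"                 # rule 2
    if n_category >= 1 and n_numeric == 1 and n_ordinal == 0:
        # rules 3 and 4: :FREQ T turns classification data into
        # frequency classification data
        return "frequency classification" if freq else "classification"
    return "multivariate"                 # rule 1: everything else


print(data_type(["numeric", "numeric"]))                          # multivariate
print(data_type(["category", "category"]))                        # category
print(data_type(["numeric", "category"]))                         # classification
print(data_type(["numeric", "category", "category"], freq=True))  # frequency classification
```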


_____________________________________

ADVANCED USES: FREQUENCY TABLE DATA AND MATRIX DATA 
_____________________________________

FREQUENCY TABLE data are data whose observations and variables form the rows and columns of a two-way table. That is, the data are a two-way cross-tabulation of co-occurrence frequencies, with the observations as one classification and the variables as the other. The data elements must be frequencies: the variables must be NUMERIC, and :FREQ T must be specified. In addition, you must use both :ROW-LABEL and :COLUMN-LABEL, each followed by a string; these strings label the rows and the columns of the table, respectively.
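A two-way frequency table carries the same information as frequency classification data with two category variables: each cell becomes one record of (frequency, row category, column category). The Python sketch below flattens a made-up 2x3 table to show that correspondence. `table_to_records` is a hypothetical helper, and the sketch does not claim that ViSta converts between the two types this way.

```python
def table_to_records(table, row_names, col_names):
    """Flatten a two-way frequency table into (frequency, row, column)
    records, the layout of frequency classification data."""
    records = []
    for row_name, row in zip(row_names, table):
        if len(row) != len(col_names):
            raise ValueError("every row needs one count per column")
        for col_name, count in zip(col_names, row):
            records.append((count, row_name, col_name))
    return records


# Made-up table: observations (rows) are hair colours,
# variables (columns) are eye colours, cells are counts.
hair = ["Dark", "Fair"]
eyes = ["Brown", "Blue", "Green"]
counts = [[30, 10, 5],
          [8, 25, 12]]

records = table_to_records(counts, hair, eyes)
print(len(records))                 # 6
print(records[0])                   # (30, 'Dark', 'Brown')
print(sum(r[0] for r in records))   # 90
```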

MATRIX data are data whose observations and variables refer to the same things. These things form the rows and columns of a square, usually symmetric, matrix. The values might be correlations, covariances, distances, etc. A data object may contain more than one matrix, but all of its matrices must have rows and columns identifying the same things. The :MATRICES keyword, used only for matrix data, must be followed by a list of strings specifying the matrix names (and, indirectly, the number of matrices). The :SHAPES keyword, which is optional and used only for matrix data, must be followed by a list of strings, "symmetric" or "asymmetric" (case ignored), specifying the shape of each matrix. All matrices are symmetric by default.
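The matrix-data rules can be expressed as a validity check. In this Python sketch, `check_matrices` is a hypothetical helper, not ViSta code. It raises an error if a matrix is not square with one row and column per variable, or if the number of shapes does not match the number of matrices. It returns False if a matrix declared "symmetric" (the default, case ignored) differs from its transpose, and True otherwise.

```python
def check_matrices(matrices, n_vars, shapes=None):
    """Check matrix data against the rules above (hypothetical helper)."""
    if shapes is None:
        shapes = ["symmetric"] * len(matrices)   # symmetric by default
    if len(shapes) != len(matrices):
        raise ValueError("need one shape per matrix")
    for m, shape in zip(matrices, shapes):
        if len(m) != n_vars or any(len(row) != n_vars for row in m):
            raise ValueError("each matrix needs one row and column per variable")
        shape = shape.lower()                    # case is ignored
        if shape not in ("symmetric", "asymmetric"):
            raise ValueError(f"unknown shape: {shape}")
        if shape == "symmetric":
            for i in range(n_vars):
                for j in range(i + 1, n_vars):
                    if m[i][j] != m[j][i]:
                        return False
    return True


distances = [[0, 3, 4],
             [3, 0, 5],
             [4, 5, 0]]
confusions = [[9, 1, 0],    # made-up, deliberately not symmetric
              [2, 7, 1],
              [0, 3, 6]]

print(check_matrices([distances], 3))                  # True
print(check_matrices([distances, confusions], 3))      # False
print(check_matrices([distances, confusions], 3,
                     ["Symmetric", "ASYMMETRIC"]))     # True
```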


_____________________________________

THE LONG-FORM OF THE DATASET MACRO
_____________________________________

The complete "long form" of the DATASET macro creates a new data object from values supplied within the DATASET form itself. The minimum required syntax is:

   (DATASET NAME 
            :VARIABLES (VARLIST) 
            :DATA (DATALIST) )

For example:

   (dataset example 
            :variables (abc def) 
            :data (1 2 3 4 5 6))
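The example above supplies six data elements for two variables. The Python sketch below shows one way such a flat :DATA list maps onto observations. It assumes the values are listed observation by observation (both variables for the first observation, then both for the second, and so on). This ordering is an assumption of the sketch, not something stated in this help file, and `reshape_data` is a hypothetical helper.

```python
def reshape_data(variables, data):
    """Split a flat :DATA list into observations, ASSUMING the values are
    listed observation by observation (hypothetical helper)."""
    n_vars = len(variables)
    if n_vars == 0 or len(data) % n_vars != 0:
        raise ValueError("number of data elements does not conform to :VARIABLES")
    # Rows: one list of values per observation.
    rows = [data[i:i + n_vars] for i in range(0, len(data), n_vars)]
    # Columns: one list of values per variable name.
    columns = {v: [row[k] for row in rows] for k, v in enumerate(variables)}
    return rows, columns


rows, columns = reshape_data(["abc", "def"], [1, 2, 3, 4, 5, 6])
print(len(rows))   # 3 observations
print(columns)     # {'abc': [1, 3, 5], 'def': [2, 4, 6]}
```

Under the opposite assumption (all values of the first variable, then all values of the second), the same list would give abc = 1 2 3 and def = 4 5 6. Checking the ViSta documentation for the actual order before relying on either layout is advisable.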

The required arguments are described first, followed by the optional long-form arguments:

REQUIRED ARGUMENTS: 
NAME &KEY :DATA :VARIABLES

  NAME must be a string or a symbol. This is the name of the newly defined data object.
  
  :VARIABLES must be followed by a list of strings or symbols defining variable names (and, indirectly, the number of variables). 

  :DATA must be followed by a list of numbers, strings or symbols (symbols are converted to uppercase strings). The number of data elements must be consistent with the other arguments: in the example above, six data elements for two variables give three observations.

GENERAL OPTIONAL ARGUMENTS: 
&KEY :TYPES :LABELS :FREQ :ABOUT 

  :TYPES must be followed by a list with one entry per variable, specifying whether that variable is numeric, ordinal or categorical. Each entry is one of the strings "numeric", "ordinal" or "category" (case ignored), or the corresponding symbol (the same word without quotes). All variables are numeric by default.

  :LABELS must be followed by a list of strings specifying observation names ("Obs1", "Obs2", etc., by default). 

  :FREQ must be followed by T to specify that the values of the numeric variables are frequencies. 

  :ABOUT must be followed by a string of information about the data.
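The defaults for :TYPES and :LABELS described above can be summarized in a short Python sketch. `normalize_options` is a hypothetical helper, not part of ViSta. It fills in "numeric" for every variable when no types are given, ignores case in type names, names observations "Obs1", "Obs2", ... when no labels are given, and checks that there is one type per variable and one label per observation.

```python
VALID_TYPES = ("numeric", "ordinal", "category")


def normalize_options(variables, n_obs, types=None, labels=None):
    """Fill in the documented defaults for :TYPES and :LABELS and check
    their lengths. Hypothetical helper, not ViSta's code."""
    if types is None:
        types = ["numeric"] * len(variables)        # all numeric by default
    types = [str(t).lower() for t in types]         # case is ignored
    if len(types) != len(variables):
        raise ValueError("need one type per variable")
    for t in types:
        if t not in VALID_TYPES:
            raise ValueError(f"unknown variable type: {t}")

    if labels is None:
        labels = [f"Obs{i}" for i in range(1, n_obs + 1)]   # "Obs1", "Obs2", ...
    if len(labels) != n_obs:
        raise ValueError("need one label per observation")
    return types, labels


print(normalize_options(["abc", "def"], 3))
# (['numeric', 'numeric'], ['Obs1', 'Obs2', 'Obs3'])
print(normalize_options(["x", "y"], 2, types=["NUMERIC", "Category"], labels=["A", "B"]))
# (['numeric', 'category'], ['A', 'B'])
```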



_____________________________________

ADDITIONAL USES FOR THE DATASET MACRO
_____________________________________

The DATASET macro can also report information about data objects. To see a list of all data objects, type:
   (DATASET)
and to see the object identification of data object NAME, type:
   (DATASET NAME)

Finally, you can create a new data object from a program contained within the DATASET macro by typing:
   (DATASET NAME FORM)
where NAME is the name of the new data object, and FORM is a Lisp form, as described by Tierney (1990) or Steele (1990).
